perm filename DBL4.TEX[PEG,DBL]1 blob
sn#476089 filedate 1979-09-18 generic text, type C, neo UTF8
\input cmpdes
\input ammacs
\def\draft{T}
\titlepage
\newpage
\setcount0 35
\parttitle{AM: DISCOVERY IN MATHEMATICS AS HEURISTIC SEARCH}
\baselineskip 12pt
\chapbegin{4} % Here beginneth Chapter 4
\chaptertitle{HEURISTICS}
\rjustline{\:B Heuristics}
\runningrighthead{INTRODUCTION}
\tenpoint
\vskip 14pc
\noindent Assume that somehow {\AM} has selected a particular task from the
agenda---say ``{\bf Fill-in Examples of Primes}.'' What precisely does {\AM} do,
in order to execute the task? How {\sl are} examples of primes filled
in?
The answer can be compactly stated as follows:
\ctrline{\bf {\AM} selects relevant heuristics, and executes them.}
\yskip
This really just splits our original question into two new ones: ({\it i\/})
How are the relevant heuristics selected, and ({\it ii\/}) What does it mean
for heuristics to be executed (\eg, how does executing a heuristic
rule help to fill in examples of primes?).
These two topics (in reverse order) are the two major subjects of
this chapter. Although several examples of heuristics will be given,
the complete list is relegated to Appendix 2.\ffootnote{There
they are condensed and phrased in English. The reader wishing to see
examples of the heuristics as they actually were coded in LISP should
glance at Appendix 3.2.}\par
The first section explains what heuristic rules look like (their
``syntax,'' as it were). The next three sections illustrate how they
can be executed to achieve their desired results (their ``semantics'').
Section 4--5 explains where the rules are stored and how they are
accessed at the appropriate times.
Finally, the initial body of heuristics is analyzed.
The informal knowledge they contain is categorized and described.
Although unintended,
the distribution of heuristics among the concepts turned out to be quite nonhomogeneous;
this too is described in Section 4--6.
\sectionskip
\sectionbegin[1]{Syntax of the Heuristics}
Let's start by seeing what a heuristic rule looks like. In
general (see [Davis & King 75] for historical references to production rules),
it will have the form
{\parindent 45pt \parskip 1pt
If {\sl <situational fluent>}
Then {\sl <actions>}
}
\parindent 20pt
As an illustration, here is a heuristic rule, relevant when checking examples
of anything:
\han2{{\it If the current task is to Check Examples of any concept X,}}
\han3{{\it and (Forsome Y) Y is a generalization of X,}}
\han3{{\it and Y has at least 10 examples,}}
\han3{{\it and all examples of Y are also examples of X,}}
\han2{{\it Then print the following conjecture: \sl ``X is really no more specialized
than Y,''}}
\han3{{\it and add it to the Examples facet of the concept named ``Conjectures,''}}
\han3{{\it
and
add the following task to the agenda:
{\bf ``Check examples of Y''}, for the reason:
``Just as Y was no more general than X, one-of Generalizations(Y) may
turn out to be no more general than Y,'' with a rating for that reason
computed as the average of $\|$Examples(Generalizations(Y))$\|$,
$\|$Examples(Y)$\|$, and Priority(Current task).}}
\yskip
\noindent As with production rules,
and formal grammatical rules, each of {\AM}'s heuristic rules
has a left-hand-side and a right-hand-side.
On the left is a test to see whether the rule is applicable, and on the right
is a list of actions to take if the rule applies.
The left-hand-side will also be called the IF-part, the predicate, the
preconditions, left side, or the situational fluent of the rule.
The right-hand-side will sometimes be referred to as the THEN-part, the
response, the right side, or the actions part of the rule.
\subsectionbegin[1.1]{Syntax of the Left-hand Side}
The situational fluent (left hand side)
is a LISP predicate, a function which always
returns True or False (in LISP, it actually returns either the atom T
or the atom NIL). This predicate may investigate facets of any
concept (often merely to see whether they are empty or not), use the
results of recent tests and behaviors (\eg, to see how much cpu time
{\AM} spent trying to work on a certain task), etc.
The left side is a conjunction of the
form $P↓1 \meet P↓2 \meet \ldots$. All the
conjuncts, except the very first one, are arbitrary LISP predicates.
They are only constrained to obey two commandments:
\listbegin
\hddlistentry[Be quick!]{(return either True or False in under 0.1 cpu seconds).}
\hddlistentry[Have no side effects!]{(\eg, no destroying or creating list structures
or Lisp functions, no resetting of global variables).}
\listend
Here are some sample conjuncts that might appear inside a left-hand
side (but {\sl not} as the very first conjunct):
{\it\listbegin
\bullistentry{More than half of the current task's time quantum is already
exhausted,\ellipsis}
\bullistentry{There are some known examples of Structures,\ellipsis}
\bullistentry{Some generalization of the current concept (the concept
mentioned as part of the current task) has an empty Examples
facet,\ellipsis}
\bullistentry{The space quantum of the current task is gone, but its time
allocation is less than 10{\rm\%} used up,\ellipsis}
\bullistentry{A task recently selected had the form {\bf ``Restructure facet F
of concept X''}, where F is any facet, and X is the current
concept,\ellipsis}
\bullistentry{The user has used this system at least once before,\ellipsis}
\bullistentry{It's Tuesday,\ellipsis}
\listend}
The very first conjunct of each left-hand side is special. Its syntax
is highly constrained. It specifies the domain of applicability of
the rule, by naming a particular facet of a particular concept to which
this rule is relevant.
{\AM} uses this first conjunct as a fast ``pre-precondition,'' so that the
only rules whose left-hand sides get evaluated are already known to
be somewhat relevant to the task at hand. In fact, {\AM} physically
attaches each rule to the facet and concept mentioned in its first
conjunct.\ffootnote{Sometimes, I will mention where a certain rule is
attached; in that case, I can omit explicit mention of the first
conjunct. Conversely, if I include that conjunct, I needn't tell you
where the rule is stored.}\par
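This attach-to-first-conjunct dispatch scheme can be sketched in a few lines. AM itself was written in LISP; the following is a minimal, hypothetical Python illustration of the idea, and every name in it is invented for this sketch:

```python
# Hypothetical sketch of AM's rule dispatch (AM itself was LISP).
# Each rule is physically attached to the (concept, facet) named by
# its first conjunct; only those rules' remaining conjuncts are ever run.

rules_by_site = {}  # (concept, facet) -> list of (conjuncts, actions)

def attach_rule(concept, facet, conjuncts, actions):
    rules_by_site.setdefault((concept, facet), []).append((conjuncts, actions))

def relevant_rules(task):
    # The fast "pre-precondition": the first conjunct is never
    # evaluated at run time; it is implicit in where the rule is stored.
    return rules_by_site.get((task['concept'], task['facet']), [])

def run_task(task):
    for conjuncts, actions in relevant_rules(task):
        if all(p(task) for p in conjuncts):  # left-hand side triggers?
            for act in actions:              # then run the right-hand side
                act(task)

# A toy rule attached to (Primes, Examples):
attach_rule('Primes', 'Examples',
            conjuncts=[lambda t: True],
            actions=[lambda t: t.setdefault('log', []).append('filled in')])

task = {'concept': 'Primes', 'facet': 'Examples'}
run_task(task)
print(task['log'])   # ['filled in']
```

The point of the design is visible in `relevant_rules`: rules stored on other concepts are never even examined, so the expensive conjuncts run only on plausibly relevant rules.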
Here are two typical examples of allowable
first conjuncts:
{\it\listbegin
\bullistentry{The current task (the one last selected from the agenda) is of
the form {\bf ``Check the Domain/range facet of concept X''}, where X is
any operation}
\bullistentry{The current task is of the form {\bf ``Fillin the examples facet
of the Primes concept''}}
\listend}
These are the only guidelines which the left-hand side of a heuristic
rule must satisfy. Any LISP predicate which satisfies these
constraints is a syntactically valid left-hand side for a heuristic
rule.
It turned out later that this excessive freedom made it difficult for
{\AM} to inspect, analyze, and synthesize its own heuristics; such a
need was not foreseen at the time {\AM} was designed.
Because of this freedom, there is not much more to say about the
left-hand sides of rules. As the reader encounters heuristics in the
next few sections, he should notice the (unfortunate) variety of
conjuncts which may occur as part of their left-hand sides.
\subsectionbegin[1.2]{Syntax of the Right-hand Side}
``Running'' the left-hand-side means evaluating the series of conjoined
little predicates there, to see if they all return True. If so, we
say that the rule ``triggers.'' In that case, the right-hand-side is
``run,'' which means executing all the actions specified there. A
single heuristic rule may have a list of several actions as its
right-hand-side. The actions are executed in order, and we then say
the rule has finished running.
Only the right-hand-side of a heuristic rule is permitted to have
side effects. The right side of a rule is a series of little LISP
functions, each of which is called an {\sl action}.
Semantically, each action performs some processing which is
appropriate in some way to the kinds of situations in which the
left-hand-side would have triggered. The final value that the action
function returns is irrelevant.
Syntactically, there is only one constraint which each function or
``action'' must satisfy: Each action has one of the following 3
side-effects, and no other side-effects:
\listbegin
\bullistentry{It suggests a new task for the agenda.}
\bullistentry{It causes a new concept to be created.}
\bullistentry{It adds (or deletes) a certain entry to a particular
facet of a particular concept.}
\listend
\noindent To repeat: the right side of a rule contains a list of actions, each
of which is one of the above three types. A single rule might thus
result in the creation of several new concepts, the addition of many
new tasks to the agenda, and the filling in of some facets of some
already-existing concepts.
These three kinds of actions will now be discussed in the following
three sections.
\sectionskip
\sectionbegin[2]{Heuristics Suggest New Tasks}
This section discusses the ``proposing a new task'' kind of action.
Here is the basic idea in a nutshell: The left-hand-side of a rule triggers.
Scattered among the ``things to do'' in its right-hand-side are
some suggestions for future tasks. These
new tasks are then simply added to the agenda list.
\subsectionbegin[2.1]{An Illustration: ``Fill in Generalizations of Equality''}
If a new task is suggested by a heuristic rule, then that rule must
specify how to assemble the new task, how to get reasons for it, and
how to evaluate those reasons. For example, here is a typical
heuristic rule which proposes a new task to add to the agenda. It says
to generalize a predicate if it is very rarely\ffootnote{The most
suspicious part of the situational fluent (the IF-part) is the
number ``.05.'' Where
did it come from? Hint: if all humans had f fingers per hand,
h hands, t toes per foot, and F feet, this would probably be
1/(fh+tF). Seriously, one can change this value (to .01 or to .25)
with virtually no change in {\AM}'s behavior. This is the conclusion of
experiment 3 (see Section 6.2.3). Such empirical
justification is one
important reason for actually writing and running large programs like {\AM}.}
satisfied:
\han2{{\it If the current task was (Fill-in examples of X),}}
\han3{{\it and X is a predicate,}}
\han3{{\it and more than 100 items are known in the domain of X,}}
\han3{{\it and at least 10 cpu seconds were spent trying to randomly instantiate X,}}
\han3{{\it and the ratio of successes/failures is both >0 and less than .05}}
\han2{{\it Then add the following task to the agenda:
{\bf (Fill-in generalizations of X)},}}
\han3{{\it for the following reason:}}
\han3{{\it ``X is rarely satisfied; a less restrictive concept might be
more interesting.''}}
\han3{{\it This reason's rating is computed as three times
the ratio of non\-ex\-am\-ples/ex\-am\-ples found.}}
\noindent Even this is one full step above the actual LISP implementation,
where ``{\bf X is a predicate}'' would be coded as ``{\bf (MEMBER X (EXAMPLES
PREDICATE))}.'' The function {\bf EXAMPLES(X)} rummages about looking
for already-existing examples of X.
Also, the LISP code contains information for normalizing all the numbers
produced, so that they will lie in the range 0--1000.
Let's examine an instance of where this rule was used. At some point,
{\AM} chose the task ``{\bf Fillin examples of List-equality}.'' One of the
ways it filled in examples of this predicate was to run it on pairs
of randomly-chosen lists, and observe whether the result was True or
False.\ffootnote{The True ones became examples of List-equality, and the pairs
of lists which didn't satisfy this predicate became known as
non-examples (failures, foibles,\ellipsis). A heuristic similar to this
``random instantiation'' one is illustrated in Section 4--3.8.}
Say that 244 random pairs of lists were tried, and only twice was
this predicate satisfied. Sometime later, the IF part
of the above heuristic is examined. All the conditions are met, so it
``triggers.''
For example, the ``ratio of successes to failures'' is just 2/242, which is
clearly greater than zero and less than 0.05.
So the right-hand side (THEN-part) of the above rule is executed.
The right-hand side initiates only one action: the task
``{\bf Fillin generalizations of List-equality}'' is added to the agenda,
tagged with the reason ``List-equality is rarely satisfied; a slightly
less restrictive concept might be more interesting,'' and that reason
is assigned a numeric rating of $3\times (242/2) = 363$.
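The arithmetic of this episode can be replayed in a few lines. This is an illustrative Python sketch of the numbers just given, not AM's actual code:

```python
# Replaying the List-equality episode (illustrative only, not AM's code):
# 244 random pairs of lists were tried; 2 satisfied the predicate, 242 did not.
successes, failures = 2, 242

ratio = successes / failures          # 2/242, about 0.008
assert 0 < ratio < 0.05               # so the rule's IF-part triggers

# The reason-rating formula: three times the ratio of
# non-examples/examples found, clipped into the 0..1000 range.
rating = min(1000, 3 * (failures / successes))
print(rating)   # 363.0
```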
Notice that the heuristic rule above supplied a little function to
compute the value of the reason. That formula was:
``three times the ratio of non\-ex\-am\-ples/ex\-am\-ples
found.''\footnote{In actuality, this would be checked to ensure that the
result lies between 0 and 1000.} Functions of this type, to compute the
rating for a reason, satisfy the same two constraints as the
left-hand-side did: the function must be very fast and it must have
no side effects. The ``intelligence'' that {\AM} exhibits in selecting which
task to work on ultimately depends on the accuracy of these local rule
evaluation formulae. Each one is so specialized that it is ``easy'' for it to
give a valid result; the range of situations it must judge is quite narrow.
Note that these little formulae were hand-written, individually, by the
author. {\AM} wasn't able to create new little reason-rating formulae.
The reason-rating function is evaluated at the moment the job is
suggested, and only the numeric result is remembered, not the
original function. In other words, we tack on a list of reasons and
associated numbers, for each job on the agenda. The agenda {\sl doesn't}
maintain copies of the reason-rating functions which gave those numbers.
This simplification is used merely to save the system some space and time.
Let's turn now from the reason-rating formulae to the reasons themselves.
Each reason supporting a newly-suggested
job is simply an English sentence (an opaque string, a token).
{\AM} cannot do much intelligent processing on these reasons.
{\AM} is not
allowed to inspect parts of it, parse it, transform it, etc. The most
{\AM} can do is compare two such tokens for equality. Of course, it
is not too hard to imagine this capability extended to permit {\AM} to
syntactically analyze such strings, or to trivially compute some sort
of ``difference'' between two given reasons.\ffootnote{It is in fact
trivial to {\sl imagine} it. Of course {\sl doing} it is quite a bit
less trivial. In fact, it probably is the toughest of all the ``open research
problems'' I'll propose.} Each reason is assumed to have some semantic
impact on the user, and is kept around partly for that purpose.
Each reason for task $\tau$ has a numeric rating (a number between 0 and 1000)
assigned to it locally by the heuristic rule which proposed $\tau$
for that reason. One global formula then combines
all the reasons' ratings into one single priority value for the task.
\subsectionbegin[2.2]{The Ratings Game}
In general, a task on the agenda list will have several reasons in
support of it. Each reason consists of an English phrase and a
numeric rating. How can a task have more than one reason? There are
two contributing factors: ({\sl i}) A single heuristic rule can have
several reasons in support of a job it suggests, and ({\sl ii}) When a rule
suggests a ``new'' task, that very same task may already exist on the
agenda, with quite distinct reasons tacked on there. In that case,
the new reason(s) are added to the already-known ones.
One global formula looks at all the ratings for the reasons, and
combines them into a single priority value for the task as a whole.
Below is that formula, in all its gory detail:
$$Worth(J) = \|\sqrt{\Sigma R↓i↑2}\,\| \times \biglp .2\cdot Worth(A) + .3\cdot Worth(F) + .5\cdot Worth(C)\bigrp$$
\noindent (where J = job to be judged = (Act A, Facet F, Concept C)
and $\{R↓i\}$ are the ratings of the reasons supporting J).
\yskip
For example, consider the job J = {\bf (Check examples of Primes)}. The
act A would be ``{\bf Check},'' which has a numeric worth of 100. The facet
F would be ``{\bf Examples},'' which has a numeric worth of 700. The concept
C would be ``{\bf Primes},'' which at the moment might have Worth of 800.
Say there were four reasons, having values 200, 300, 200, and 500.
The double lines ``$\|\ldots\|$'' indicate normalization: the result of
the square-root is divided by 1000, and then truncated to 1.0 if it
exceeds unity, so the final value lies between 0 and 1.
In this case, we first compute
$\sqrt{200↑2 + 300↑2 + 200↑2 + 500↑2} = \sqrt{420,000}$, which is about 648.
After normalization, this becomes
0.648. The expression in large parentheses in the formula
is actually computed
as the dot-product of two vectors, namely, (Worth(A), Worth(F),
Worth(C)) and (.2 .3 .5);
in this case it is the dot-product of (100 700 800) and (.2 .3 .5),
which yields 630. This is multiplied by the normalized Square-root
value, 0.648, and we end up with a final priority rating of 408.
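The whole computation can be transcribed directly from the formula. Here is a hypothetical Python sketch; the function name and argument names are invented for this illustration:

```python
import math

# Hypothetical sketch of the global priority formula above:
# Worth(J) = ||sqrt(sum Ri^2)|| * (.2*Worth(A) + .3*Worth(F) + .5*Worth(C))
def worth(reason_ratings, worth_act, worth_facet, worth_concept):
    root = math.sqrt(sum(r * r for r in reason_ratings))
    normalized = min(1.0, root / 1000.0)   # the ||...|| normalization
    dot = 0.2 * worth_act + 0.3 * worth_facet + 0.5 * worth_concept
    return normalized * dot

# The worked example: J = (Check examples of Primes),
# with four reasons rated 200, 300, 200, and 500.
w = worth([200, 300, 200, 500],
          worth_act=100, worth_facet=700, worth_concept=800)
print(round(w))   # 408
```

Note that an empty list of reasons yields a priority of zero, matching the "absolute zero rating" property listed below.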
The four reasons each have a fairly low priority, and the total
priority of the task is therefore not great. It is, however, higher
than any single reason multiplied by 0.648. This is because there
are many {\sl distinct} reasons supporting it. The global formula
uniting these reasons' values does not simply take the largest of
them (ignoring the rest), nor does it simply add them up.
The above formula was intended originally as a first pass, an {\it ad
hoc} guess, which I expected I'd have to modify later. Since it has
worked successfully, I have not messed with it. There is no reason
behind it, no justification for taking dot-products of vectors, etc.
I concluded, and recent experiments tend to confirm, that the
particular form of the formula is unimportant; only some general
characteristics need be present:
\listbegin
\bullistentry{The priority value of a task is a monotone increasing function of
each of its reasons' ratings. If a new supporting reason is found,
the task's value is increased. The better that new reason, the
bigger the increase.}
\bullistentry{If an already-known supporting reason is re-proposed, the value of
the task is {\sl not} increased (at least, it's not increased very much).
Like humans, {\AM} is fooled whenever the same reason reappears in disguised
form.}
\bullistentry{The priority of a task involving concept C should be a monotone
increasing function of the overall worth of C. Two similar tasks
dealing with two different concepts,
each supported by the same list of reasons and reason ratings, should be
ordered by the worth of those two concepts.}
\listend
\noindent I believe that all of these criteria are absolutely essential to good
behavior of the system. Several of the experiments discussed later bear on
this question (See Section 6--2).
Note that the messy formula does incorporate all three
of these constraints. In addition, there are a few features of that
formula which, while probably not necessary or even desirable, the
reader should be informed of explicitly:
\listbegin
\bullistentry{The task's value does not depend on the order in which the reasons
were discovered. This is not true psychologically of people, but it
is a feature of the particular priority-estimating formula initially
selected.}
\bullistentry{Two reasons are either considered identical or unrelated. No
attempt is made to reduce the priority value when several of the
reasons overlap semantically or even just syntactically. This,
too, is no doubt a mistake.}
\bullistentry{There is no need to keep around all the individual reasons' rating
numbers. The addition of a new reason will demand only the knowledge
of the {\sl number} of other reasons, and the old priority value of the
task.}
\bullistentry{A task with no reasons gets an absolute zero rating. As new
reasons are added, the priority slowly increases toward an absolute
maximum which is dependent upon the overall worth of the concept and
facet involved.}
\listend
\noindent There is one topic of passing interest which should be covered here.
Each possible Act A (\eg, Fillin, Check, Apply) and each possible facet
F (\eg, Examples, Definition, Name(s)) is assigned a fixed numeric value
(by hand, by the author).
These values are used inside the formula given above, where it says
``Worth(A)'' and ``Worth(F).'' They are fairly resistant to change, but
certain orderings should be maintained for best results. For example, ``Examples''
should be rated higher than ``Specializations,'' or else {\AM} may whirl away
on a cycle of specialization long after the concept has been constrained
into vacuousness. As for the Acts, their precise values turned out to be even less
important than the Facets'.
Now that we've seen how to compute this priority value for any given
task, let's not forget what it's used for. The overall rating has two
functions:
\listbegin
\numlistentry[1]{These ratings determine which task
to execute next. This is not an ironclad policy: In reality, {\AM}
prints out the top few tasks, and the user has the option of
interrupting and directing {\AM} to work on one of those other tasks
instead of the very top one.}
\numlistentry[2]{Once a task is chosen, its rating determines how much
time and space {\AM} will expend on it before quitting and moving on to
a new task. The precise formulae are unimportant. Roughly, the
0--1000 rating is divided by ten to determine how much time to allow,
in cpu seconds. The rating is divided by two to determine how much
space to allow, in list cells.}
\listend
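The rough conversion in item 2 can be sketched as follows. The text stresses that the precise formulae are unimportant; this Python fragment (function name hypothetical) just illustrates the stated rules of thumb:

```python
# Hypothetical sketch of the rough resource quanta: a 0--1000
# priority rating buys rating/10 cpu seconds and rating/2 list cells.
def quanta(priority_rating):
    cpu_seconds = priority_rating / 10
    list_cells = priority_rating / 2
    return cpu_seconds, list_cells

print(quanta(408))   # (40.8, 204.0)
```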
\sectionskip
\sectionbegin[3]{Heuristics Create New Concepts}
Recall that a heuristic rule's actions are of three types:
\listbegin
\bullistentry{Suggest new tasks and add them to the agenda.}
\bullistentry{Create a new concept.}
\bullistentry{Fill in some entries for a facet of a concept.}
\listend
\noindent This section discusses the second activity.
\yskip
Here is the basic idea in a nutshell:
Scattered among the ``things to do'' in the right-hand-side of a rule
are some requests to create specific new concepts. For each such request,
the heuristic rule
must specify how to construct it. At least, the rule must
specify ways of assembling enough facets of the new concept to
disambiguate it from all the other known concepts. Typically, the
rule will explain how to fill in the Definition of---or an Algorithm
for---the new concept. After executing these instructions, the new
concept will ``exist,'' and a few of its facets will be filled in, and
a few new jobs will probably exist on the agenda, indicating that {\AM} might want
to fill in certain other facets of this new concept in the near future.
\subsectionbegin[3.1]{An Illustration: Discovering Primes}
Here is a heuristic rule that results in a new concept being created:
\han2{{\it If the current task was {\bf (Fill-in examples of F)},}}
\han3{{\it and F is an operation from domain space A into range space B,}}
\han3{{\it and more than 100 items are known examples of A (in the domain of F),}}
\han3{{\it and more than 10 range items (in B) were found by applying F to these
domain items,}}
\han3{{\it and at least 1 of these range items is a distinguished member (esp:
extremum)\ffootnote{This is handled as follows:
{\AM} takes the given list of range items. It eliminates
any which are not interesting (according to Interests(B)) or extreme (an entry
on B.Exs-Bdy, the boundary examples of B).
Finally, all those extreme range items are moved to
the front of this list.
{\AM} begins walking down this list, creating
new concepts according to the rule. Sooner or later,
a timer (or a storage-space-watcher) will terminate this costly activity.
Only the frontmost
few range items on the list will have generated new concepts.
So ``especially'' really just means priority consideration.}
of B,}}
\han2{{\it Then (for each such distinguished member
`b'$\epsilon$B) create the following new concept:}}
\yskip
\ragged 1000000
\moveright 30pt \inbox{\hbox
{\vbox
{\hbox{
\bf Name: \it F\inv -of-b } \hbox{
\bf Definition: \it $\lambda (x) \biglp F(x) = b \bigrp $ } \hbox{
\bf Generalization: {\it A } } \hbox{
\bf Worth: \it $ Average \biglp Worth(A), Worth(F), Worth(B), Worth(b), $ } \hbox{
\hskip 80pt $100 \times max(10, \|Examples(B)\|)\bigrp $ } \hbox{
\bf Interest: \it Any conjecture involving both this concept and either F or F\inv
}}}}
\ragged 0
\han5{\it In case the user asks, the reason for doing this is:
``Worthwhile
investigating those A's which have an unusual F-value, namely, those
whose F-value is b''}
\yskip
\han3{\it and the total amount of time to spend
right now on all of these new
concepts is computed as: }
\han4{\it Half the remaining cpu time in the current
task's time quantum.}
\han3{\it and the total amount of space to spend right now on each of these new
concepts is computed as: }
\han4{\it 90{\rm \%} \ of the remaining space quantum for the current task.}
\noindent \rm Heuristics for the new concept are quite hard to fill in.
This was one of {\AM}'s most serious limitations, in fact (see Chapter 7).
Above, we see a trivial kind of ``heuristic schema'' or template, which gets
instantiated to provide one new, specialized heuristic about the new concept.
That new heuristic tells how to judge the interestingness of any conjecture
which crops up involving this new concept. As new conjectures get proposed,
they are evaluated by calling on a set of heuristics including this one.
Although some examples of {\bf F\inv -of-b} might be easily obtained
(or already known) at the moment of its creation, the above rule doesn't specifically
tell {\AM} how to fill in that facet. The very last line of the heuristic indicates
that a few cpu seconds may be spent on just this sort of activity: filling in facets
of the new concept which, though not explicitly mentioned in the rule, are
easy to fill in now.
Any facet {\it f} which didn't get filled in ``right now'' will
probably cause a new task to be added to the agenda, of the form: ``{\bf Fillin
facet {\it f} of concept {\it F\inv -of-b}}.''
Eventually, {\AM} would choose that task, and spend a large quantum of time working
on that single facet.
Now let's look at an instance of when this heuristic was used. At one
point, {\AM} was working on the task ``{\bf Fill-in examples of
Divisors-of}.''
This heuristic's IF-part was triggered because: Divisors-of is an
operation (from Numbers to Sets of numbers), and far more than 100 different
numbers are known, and more than 10 different sets of factors were found
altogether, and some of them were distinguished by being extreme
kinds of sets: empty-sets, singletons, doubletons and tripletons.
After its left side triggered, the right side of the heuristic rule was executed.
Namely, four new concepts were created immediately. Here is one of
them:
\yskip
\ragged 1000000
\moveright 30pt \inbox{\hbox
{\vbox
{\hbox{
\bf Name: \it Divisors-of\inv -of-Doubleton } \hbox{
\bf Definition: \it $\lambda (x)\ \biglp$ Divisors-of({\it x})
is a Doubleton$\bigrp $ } \hbox{
\bf Generalization: {\it Numbers } } \hbox{
\bf Worth: \it 100 } \hbox{
\bf Interest: \it Any conjecture involving both this concept and either}\hbox{
\it Divisors-of or Times
}}}}
\ragged 0
\yskip\rm
\noindent This is a concept representing a certain class of numbers, in fact
the numbers we call {\sl primes}. The heuristic resets a certain
variable, so that in case the user interrupts and asks {\bf Why?}, {\AM} informs him:
{\it ``This concept was created because it's worthwhile investigating
those numbers which
have an extreme divisors-of value; in this case, numbers
which have only two divisors.'' }
{\AM} was willing to spend half
the remaining quantum of time allotted to
the task {\bf ``Fillin examples of
Divisors-of''} on these four new concepts.\ffootnote{Some trivial details: One-eighth
of the remaining time is spent on each of
these 4 concepts: Numbers-with-0-divisors, Numbers-with-1-divisor,
Numbers-with-2-divisors, Numbers-with-3-divisors.
The original time/space limits were in reality about 25 cpu seconds and
800 list cells, and at the moment this heuristic was called, only
about 10 seconds and 600 cells remained, so \eg the concept Primes
was allotted only 1.2 cpu seconds to ``get off the
ground.'' This was no problem, as it used far less than that.
The heuristic rule states that each of the four new concepts may use up
to 90\%\ of the
remaining space allocation (540 out of 600 cells),
and, \eg, Primes needed only a fraction of
that initially.}\par
The heuristic rule is applicable to any operation, not just numeric
ones. For example, when {\AM} was filling in examples of
Set-Intersection, it was noticed that some pairs of sets were mapped
into the extreme kind of set Empty-set. The above rule then had {\AM}
define the concept of {\sl Disjointness}: pairs of sets having empty
intersection.
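The kernel of this rule, defining a new concept as the inverse image of a distinguished range value, can be sketched in Python. AM used LISP; all names below are hypothetical, and the "distinguished members" here are the extreme set sizes 0 through 3 from the Divisors-of episode:

```python
# Hypothetical sketch of the concept-creating rule above (AM was LISP):
# given an operation F, create a concept "F-inverse-of-b" for each
# distinguished (extreme) range value b, defined as lambda x: F(x) = b.

def divisors_of(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def invert_at_extremes(F, extreme_sizes=(0, 1, 2, 3)):
    concepts = {}
    for k in extreme_sizes:
        concepts[f'{F.__name__}-inv-of-{k}'] = {
            # the default argument freezes k in each concept's definition
            'definition': (lambda x, k=k: len(F(x)) == k),
            'generalization': 'Numbers',
        }
    return concepts

concepts = invert_at_extremes(divisors_of)   # four new concepts
primes_def = concepts['divisors_of-inv-of-2']['definition']
print([n for n in range(2, 20) if primes_def(n)])
# [2, 3, 5, 7, 11, 13, 17, 19]  -- the numbers we call primes
```

The concept built from the range value "doubleton" is exactly the Primes concept shown above: numbers with precisely two divisors.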
\subsectionbegin[3.2]{The Theory of Creating New Concepts}
All the heuristic rule must do is to fill in enough facets so that the
new concept is disambiguated from all the others, so that it is
``defined'' clearly. Should {\AM} pause and fill in lots of facets at
that time? After all, several pieces of information are trivial to
obtain at this moment, but may be hard to reconstruct later (\eg,
the reason why C was created). On the other hand, filling in
anything without a good reason is a bad idea (it uses up time and
space, and it won't dazzle the user as a brilliant choice of
activity).
So the universal motto of {\AM} is to fill in facets of a new concept
if---and only if---that filling-in activity will be much easier at
that moment than later on.
In almost all cases, the following facets\ffootnote{The reader may wish to
glance ahead to Section
5--2, to note the full range
of facets that any concept may possess: what their names are, and the
kind of information that is stored in each.} will be specified
explicitly in the heuristic rule, and thus will get filled in right
away: Definitions, Algorithms, Domain/range, Worth, plus a tie to
some related concept (\eg, if the new concept is a generalization of
Equality, then we can trivially fill in an entry on its
Specializations facet: ``Equality.'')
On the other hand, the following facets will {\sl not} be trivial to
fill in: Con\-jec\-tures, Ex\-am\-ples, Gene\-rali\-za\-tions,
Spe\-ciali\-za\-tions, and
In\-terest\-ing\-ness. For example, filling in the Spe\-ciali\-za\-tions
facet
of a new concept may involve creating some new concepts; finding some
entries for its Con\-jec\-tures facet may involve a great deal of
experimenting; finding some Exam\-ples of it may involve twisting its
definition around or searching. None of these is easier to do at the time
of creation than at any other time, so each is deferred until some reason
for doing it exists.
For each such ``time-consuming'' facet F, of the new concept C, one new
task gets added to the agenda, of the form ``{\bf Fill in entries for
facet F of concept C},'' with reasons of the form ``Because C was just
created,'' and also ``No entries exist so far on C.F''.\footnote{C.F is
an abbreviation for facet F of concept C.} Most of the
tasks generated this way will have low priority ratings, and may stay
near the bottom of the agenda until/unless they are re-suggested for
a new reason.
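The ``fill in only what is cheap now'' policy just described can be sketched as follows (Python standing in for Interlisp; the facet names come from the text, while the data structures are hypothetical):

```python
# Minimal sketch: cheap facets are filled at creation time; for each costly
# facet, a "Fill in entries for facet F of concept C" task goes on the agenda.

COSTLY_FACETS = ["Conjectures", "Examples", "Generalizations",
                 "Specializations", "Interestingness"]

def create_concept(name, cheap_entries, agenda):
    """Create concept C, fill its cheap facets, queue tasks for costly ones."""
    concept = {"Name": name}
    concept.update(cheap_entries)            # easy now, hard to reconstruct later
    for facet in COSTLY_FACETS:              # no easier now than at any other time
        agenda.append({
            "task": f"Fill in entries for facet {facet} of concept {name}",
            "reasons": [f"{name} was just created",
                        f"No entries exist so far on {name}.{facet}"]})
    return concept

agenda = []
primes = create_concept("Primes",
                        {"Worth": 100, "Generalizations": ["Numbers"]},
                        agenda)
```
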
Using the Primes example, from the last subsection, we see that a new
task like ``{\bf Fillin specializations of Primes}'' was suggested with a
low rating, and ``{\bf Fillin examples of Primes}'' was suggested with a
mediocre\footnote{Not as low a rating as the task just mentioned. Why? Each
possible facet has a worth rating which is fixed once and for all.
As an illustration, we mention that the facet Examples is rated much
higher than Specializations. Why is this? Because looking for
examples of a concept is often a good expenditure of time, producing
the raw data on which empirical induction thrives. On the other
hand, each specialization of the new concept C would itself be a
brand new concept. So filling in entries for the Specializations facet
would be a very explosive process.} rating. The ratings of these
tasks increase later on, when the same tasks are re-proposed for new reasons.
\subsectionbegin[3.3]{Another Illustration: Squaring a number}
Let's take another simple (though not atypical) illustration of how
new concepts get created.
Assume that {\AM} has recently discovered the concept of
multiplication, which it calls ``TIMES,'' and {\AM} decides that it is very
interesting. A heuristic rule exists which says:\footnote{By glancing back
at the Primes example, in Section 2--4.3.1, you can imagine
what this rule actually looked like. There is nothing to be gained by
stretching it out in all its glory, hence I've taken the liberty of
condensing it, inserting pronouns, etc.}
\han2{{\it If a newly-interesting operation F(x,y) takes a pair of N's as
arguments,}}
\han2{{\it Then create a new concept, a specialization of F,
called F-Itself, taking just one N as
argument, defined as F(x,x), with initial worth Worth(F).}}
\yskip
\noindent In the case of F = TIMES, we see that F takes a pair of numbers as
its arguments, so the heuristic rule would have {\AM} create a new
concept called TIMES-Itself, defined as TIMES-Itself(x) =
TIMES(x,x). That is, create the new concept which is the operation
of {\sl squaring a number}.
What would {\AM} do in this situation? The global list of concepts
would be enlarged to include the new atom ``TIMES-Itself,'' and the
facets of this new concept would begin to be filled in. The
following facets would get filled in almost instantly:
\yskip
\han3{{\bf NAME: {\it TIMES-Itself }}}
\han3{{\bf DEFINITIONS: {\it }}}
\han5{{\bf ORIGIN: {\it $\lambda (x,y) \biglp $TIMES.\ DEFN$(x,x,y)\bigrp $}}}
\han3{{\bf ALGORITHMS: {\it $\lambda (x) \biglp $ TIMES.\ ALG$(x,x)\bigrp $}}}
\han3{{\bf DOMAIN/RANGE: {\it Number $\→$ Number }}}
\han3{{\bf GENERALIZATIONS: {\it TIMES }}}
\han3{{\bf WORTH: {\it 600 }}}
\yskip
\noindent The name, definition, domain/range,
generalizations, and worth are specified
explicitly by the heuristic rule.
The lambda expression stored under the definition facet is an
executable LISP predicate, which accepts two arguments and then tests
them to see whether the second one is equal to TIMES-Itself of the
first argument. It performs this test by calling upon the predicate
stored under the definition facet of the TIMES concept.
Thus TIMES-Itself.Defn(4,16) will call on TIMES.Defn(4,4,16), and return
whatever value {\sl that} predicate returns (in this case, it returns True,
since 4$\times$4 does equal 16).
A trivial
transformation of this definition provides an algorithm for computing this
operation. The algorithm says to call on the Algorithms facet of the concept
TIMES. Thus TIMES-Itself.Alg(4) is computed by
calling on TIMES.Alg(4,4) and returning {\sl that} value (namely, 16).
The worth of TIMES was 600 at the moment TIMES-Itself was created,
and this becomes the worth
of TIMES-Itself.
TIMES-Itself is by definition a specialization of TIMES, so the
SPE\-CIAL\-I\-ZA\-TIONS facet of TIMES is incremented to point to this new
concept. Like\-wise, the GENERALIZATIONS facet of TIMES-Itself points
to TIMES.
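The way TIMES-Itself's Definition and Algorithm facets call upon those of TIMES can be sketched directly (Python lambdas standing in for the Interlisp lambdas above):

```python
# Sketch of the facet entries above: TIMES-Itself is defined and computed
# entirely in terms of the corresponding facets of TIMES.

def times_alg(x, y):                 # stands in for TIMES.Alg
    return x * y

def times_defn(x, y, z):             # stands in for TIMES.Defn: is x*y = z?
    return times_alg(x, y) == z

# The heuristic rule creates F-Itself as F(x,x):
times_itself_alg  = lambda x: times_alg(x, x)          # TIMES-Itself.Alg
times_itself_defn = lambda x, y: times_defn(x, x, y)   # TIMES-Itself.Defn
```

Thus `times_itself_defn(4, 16)` calls `times_defn(4, 4, 16)` and returns whatever that predicate returns, just as described above.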
Note how easy it was to fill in these facets now, but how difficult
it might be later on, ``out of context.'' By way of contrast,
the task of, \eg, filling in
{\sl Specializations} of TIMES-Itself will be no harder later on than
it is right now, so we may as well defer it until there's a good
reason for it. This task will probably be added to the agenda with so
low a priority that {\AM} will never get around to it, unless some new
reasons for it emerge.
The task {\bf ``Fill-in examples of TIMES-Itself''} is probably
worthwhile doing soon, but again it won't be any harder to do at a
later time than it is right now. So it is not done at the moment;
rather, it is added to the agenda (with a fairly high priority).
Incidentally, the reader may be interested to know that the next few
tasks {\AM} selected (in reality) were to create the inverse of this
operation (\ie, integer square-root), and then to create a new kind
of number, the ones which can be produced by squaring (\ie, perfect
squares). Perfect squares were deemed worth having around because
Integer-square-root is {\sl defined} precisely on that set of integers.
\sectionskip
\sectionbegin[4]{Heuristics Fill in Entries for a Facet}
The last two sections dealt with how a heuristic rule is able to
propose new tasks and create new concepts. This section will
illustrate how a rule can find some entries for
a specific facet of a specific concept.
\subsectionbegin[4.1]{An Illustration: ``Fill in Examples of Set-union''}
Typically, the facet/concept involved will be the one mentioned in the
current task which was chosen from the agenda.
If the task ``{\bf Fillin Examples of Set-union}'' were plucked from the agenda,
then the ``relevant'' heuristics would be those useful for filling in
entries for the Examples facet of the Set-union concept.
Here's one such relevant heuristic, attached to the concept Activity:
\han2{{\it If the current task is to fill in examples of the
activity\ffootnote{``Activity''
is a general concept which includes operations, predicates,
relations, functions, etc.}
F,}}
\han2{{\it One way to get them is to run F on randomly chosen examples of the
domain of F.}}
\noindent Of course, in the LISP implementation, this situation-action rule is
not coded quite so neatly. It would be more faithfully translated as
follows:
\han2{{\it If CURRENT-TASK matches (FILLIN EXAMPLES F$\←$anything),}}
\han3{{\it and F isa Activity,}}
\han2{{\it Then carry out the following procedure:}}
\han3{{\it 1. Find the domain of F, and call it D;}}
\han3{{\it 2. Find examples of D, and call them E;}}
\han3{{\it 3. Find an algorithm to compute F, and call it A;}}
\han3{{\it 4. Repeatedly:}}
\han4{{\it 4a. Choose any member of E, and call it E1.}}
\han4{{\it 4b. Run A on E1, and call the result X.}}
\han4{{\it 4c. Check whether <E1,X> satisfies the definition of F.}}
\han4{{\it 4d. If so, then add <E1 $\→$ X> to the Examples facet of F.}}
\han4{{\it 4e. If not, then add <E1 $\→$ X> to the Non-examples facet of F.}}
\yskip
\noindent When
the current task is ``{\bf Fillin examples of Set-union},'' the
left-hand-side of the rule is satisfied, so the right-hand-side is
run.
Step (1) says to locate the domain of Set-union. The facet labelled
Do\-main/\-Range, on the Set-union concept, contains the entry (SET SET $\→$
SET), which indicates that the domain is a pair of sets.
That is, Set-union is an operation which accepts (as its arguments)
a pair of sets, and returns
(as its value) some new set.
Since the domain elements are sets,
step (2) says to locate examples of sets. The facet labelled
Examples, on the Sets concept, points to a list of about 30 different
sets. This includes $\{$Z$\}$,
$\{$A,B,C,D,E$\}$,
$\{$$\}$,
$\{$A,$\{\{$B$\}\}\}, \ldots$
Step (3) involves nothing more than accessing some randomly-chosen
entry on the Algorithms facet of Set-union. One such entry is a
recursive LISP function of two arguments, which halts when the first
argument is the empty set, and otherwise pulls an element out of
that set and SET-INSERT's it into the second argument, and then
recurs on the new values of the two sets. For convenience, we'll
refer to this algorithm as UNION.
We then enter the loop of Step (4). Step (4a) has us choose one pair
of our examples of sets, say the first two $\{$Z$\}$ and
$\{$A,B,C,D,E$\}$. Step
(4b) has us run UNION on these two sets. The result is $\{$A,B,C,D,E,Z$\}$.
Step (4c) has us grab an entry from the Definitions facet of
Set-union, and run it. A typical definition is this formal one:
\han3{{\bf $\lambda$ (S1 S2 S3) }}
\han4{{\bf ( AND}}
\han5{{\bf (For all x in S1, x is in S3)}}
\han5{{\bf (For all x in S2, x is in S3)}}
\han5{{\bf (For all x in S3, x is in S1 or x is in S2)}}
\han4{{\bf $\ \ $ )}}
\yskip
\noindent It is run on the three arguments S1=$\{$Z$\}$,
S2=$\{$A,B,C,D,E$\}$,
S3=$\{$A,B,C,D,E,Z$\}$. Since it returns ``True,''
we
proceed to Step (4d).
\noindent The construct <$\{$Z$\}$,
$\{$A,B,C,D,E$\} \→$
$\{$A,B,C,D,E,Z$\}$> is added to Set-un\-ion.Ex\-am\-ples.
At this stage, control returns to the beginning of the Step (4) loop.
A new pair of sets is chosen, and so on.
But when would this loop stop? Recall that each task has a time and a space
allotment (based on its priority value).
If there are many different rules all claiming to be relevant to the current task,
then each one is allocated a small fraction of those time/space quanta.
When either of these resources is exhausted,
{\AM} would break away at a ``clean'' point (just
after finishing a cycle of the Step (4) loop) and would move on to
a new heuristic rule for filling in examples of Set-union.
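Steps (1)--(4) of the rule, together with the budget cutoff at a ``clean'' point, can be sketched as follows (Python frozensets standing in for {\AM}'s sets; the budget is counted in loop cycles rather than cpu seconds, and all names are illustrative):

```python
# Sketch of the "run F on examples of its domain" heuristic for Set-union.

def union_alg(s1, s2):                       # an Algorithms entry (UNION)
    return s1 | s2

def union_defn(s1, s2, s3):                  # a Definitions entry, as above
    return (all(x in s3 for x in s1) and
            all(x in s3 for x in s2) and
            all(x in s1 or x in s2 for x in s3))

def fill_in_examples(alg, defn, domain_pairs, budget):
    examples, non_examples = [], []
    for e1 in domain_pairs[:budget]:         # break away when budget exhausted
        x = alg(*e1)                         # 4b: run A on E1
        if defn(e1[0], e1[1], x):            # 4c: check <E1,X> against the defn
            examples.append((e1, x))         # 4d: record as an example
        else:
            non_examples.append((e1, x))     # 4e: record as a non-example
    return examples, non_examples

s1, s2 = frozenset({"Z"}), frozenset({"A", "B", "C", "D", "E"})
exs, non = fill_in_examples(union_alg, union_defn, [(s1, s2)], budget=10)
```
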
This concludes the demonstration that a heuristic rule really can be
executed to produce the kinds of entities requested by the current
task.
\subsectionbegin[4.2]{Heuristics Propose New Conjectures}
We saw in the sample excerpt (Chapter 2) that {\AM} occasionally notices
some unexpected relationship, and formulates it into a precise
conjecture. Below is an example of how this is done. As you might
guess from the placement of this subsection,
the mechanism is our old
friend the heuristic rule which fills in entries for certain facets.
In fact, a conjecture evolves through four stages:
\listbegin
\numlistentry[1]{A heuristic rule looks for a particular kind of relationship.
This will typically be of the form ``X is a Generalization of Y,'' or ``X is
an example of Y,'' or ``X is the same as Y,''
or ``F1.Defn(X,Y).''\ffootnote{This says that F1(X)=Y,
where F1 is an active concept {\AM} knows about.}}
\numlistentry[2]{Once found, the relationship is checked against the supporting evidence.
A great deal of empirical evidence must favor it,
and any contradictory evidence must be ``explained away'' somehow.}
\numlistentry[3]{Now it is believed, and {\AM} prints it out to the user.
It is added as a new entry to the Conjecs facet of both concepts X and Y.
It is also
added as an entry to the Examples facet of the Conjecture concept.}
\numlistentry[4]{Eventually, {\AM} will get around to the task ``{\bf Check Examples of
Con\-jec\-ture},''
or to the task ``{\bf Check Conjecs of X}.''
If {\AM} had any concepts for proving conjectures, they
would then be invoked. In the current LISP implementation, these are
absent.
Nevertheless, several ``checks'' are performed: ({\it i\/}) see if any new empirical
evidence (pro or con) has appeared recently; ({\it ii\/}) see if this conjecture can
be strengthened; ({\it iii\/}) check it for extreme cases, and modify it if necessary;
({\it iv\/}) Modify the worth ratings of the concepts involved in the conjecture.}
\listend
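Stage (2) of this evolution can be sketched as a simple empirical test (Python standing in for Interlisp; the evidence threshold and the ``explained away only if boundary'' policy are invented for illustration):

```python
# Sketch of stage (2): believe a conjecture only if much evidence favors it
# and every contradiction can be "explained away" (here: is a boundary case).

def check_conjecture(evidence, is_boundary, min_favorable=10):
    """evidence: list of (case, holds?) pairs."""
    favorable   = [c for c, ok in evidence if ok]
    contrary    = [c for c, ok in evidence if not ok]
    unexplained = [c for c in contrary if not is_boundary(c)]
    return len(favorable) >= min_favorable and not unexplained

# "All primes are odd," with 2 and 3 tagged as boundary cases:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
evidence = [(p, p % 2 == 1) for p in primes]
believed = check_conjecture(evidence, is_boundary=lambda p: p in (2, 3))
```
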
The left-hand-side of such a heuristic rule will be longer and more
complex than most other kinds, but the basic activities of the
right-hand-side will still be filling in an entry for a particular
facet.
The entries filled in will include:
({\it i\/}) a new example of Conjectures,
({\it ii\/}) a new entry for the Conjecs facet of each concept involved in the conjecture,
({\it iii\/}) if we're claiming that
concept X is a generalization of concept Y, then ``X'' would be added to the
Generalizations facet of Y, and ``Y'' added to the Specializations facet of X,
({\it iv\/}) if X is an Example of Y,
``X'' is added to the Examples facet of Y, and ``Y'' is added to the ISA facet of X.
The right-hand-side may also involve adding new tasks to the agenda,
creating new concepts, and modifying entries of particular facets of
particular concepts. As is true of all heuristic rules,
both sides of this type of conjecture-perceiving rule
may run any little functions they want to: any functions which are quick and have
no side effects (\eg, FOR-ALL tests, PRINT functions, accesses to
a specified facet of some concept).
\subsectionbegin[4.3]{An Illustration: ``All primes except 2 are odd''}
As an illustration, here is a heuristic rule, relevant when checking
examples of any concept:
\han2{{\it If the current task is to Check Examples of X,}}
\han3{{\it and (Forsome Y) Y is a generalization of X,}}
\han3{{\it and Y has at least 10 examples,}}
\han3{{\it and all examples of Y (ignoring boundary cases) are also examples of X,}}
\han2{{\it Then print the following conjecture: X is really no more specialized than Y,}}
\han3{{\it and add it to the Examples facet of Conjectures,}}
\han3{{\it and if the user asks, inform him that the evidence for this was that
all $\|$Examples(Y)$\|$ Y's (ignoring boundary examples of Y's) turned
out to be X's as well,}}
\han3{{\it and Check the truth of this conjecture on boundary examples of Y,}}
\han3{{\it and add ``X'' to the Generalizations facet of Y,}}
\han3{{\it and add ``Y'' to the Specializations facet of X,}}
\han3{{\it and (if there is an entry in the Generalizations facet of Y) add the
following task to the agenda {\bf ``Check examples of Y''}, for the reason:
``Just as Y was no more general than X, one-of Generalizations(Y) may
turn out to be no more general than Y,'' with a rating for that reason
computed as:
$ .4\cdot \|Examples(Generalizations(Y))\| +
.3\cdot \|Examples(Y)\| +
.3\cdot Priority(Current\ task)$.}}
\yskip
\noindent Let's take a particular instance where this rule would be useful. Say
the current task is ``{\bf Check examples of Odd-primes}.'' The
left-hand-side of the rule is run, and is satisfied when the
generalization Y is the concept ``Primes.'' Let's see why this is
satisfied.
One of the entries of the Generalizations facet of Odd-primes is ``Primes.''
{\AM} grabs hold of the 30 examples of primes (located on the Examples
facet of Primes), and removes the ones which are tagged as boundary
examples (``2'' and ``3''). A definition of Odd-prime numbers is
obtained (Definitions facet of Odd-primes), and it is run on each
remaining example of primes (5, 7, 11, 13, 17, \ellipsis). Sure enough,
they all satisfy the definition. So all primes (ignoring boundary
cases) appear to be odd. The left-hand-side of the rule is
satisfied.
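The left-hand-side test just walked through can be sketched as follows (Python standing in for Interlisp; the trial-division definition of Odd-primes is an illustrative stand-in for the entry on the Definitions facet):

```python
# Sketch: all examples of Y = Primes, minus the tagged boundary cases,
# satisfy the definition of X = Odd-primes.

def odd_prime_defn(n):                       # stand-in Definitions entry
    return n > 2 and n % 2 == 1 and all(n % d for d in range(2, n))

primes_examples = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # Examples facet of Primes
boundary = {2, 3}                                         # tagged boundary examples

non_boundary = [p for p in primes_examples if p not in boundary]
lhs_satisfied = (len(primes_examples) >= 10 and
                 all(odd_prime_defn(p) for p in non_boundary))
```
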
At this point, the user sees a message of the form ``Odd-primes is
really no more specialized than Primes.'' If he interrupts and asks
about it, he is told that the evidence for this was that all 30
primes (ignoring boundary examples of primes) turned out to be
Odd-primes.
Of the boundary examples (the numbers 2 and 3), only the integer
``2'' fails to be an odd-prime, so the user is notified of the
finalized conjecture: ``All primes (other than `2') are also
odd-primes.'' This is added as an entry on the Examples facet of
the concept named `Conjectures.'
Before beginning all this, the Generalizations facet of Odd-primes
pointed to Primes. Now, this rule has us add ``Primes'' as an entry on
the Specializations facet of Odd-primes. Thus Primes is both a
generalization and a specialization of Odd-primes (to within a single
stray exception), and {\AM} will be able to treat these two concepts as
if they were merged together. They are still kept separate, however,
in case {\AM} ever needs to know precisely what the difference between
them is, or in case later evidence shows the conjecture to be
false.\ffootnote{When space is exhausted, one emergency
measure {\AM} takes is to
destructively coalesce a pair of concepts X,Y where X is both a
generalization of and a specialization of Y, even if there are a couple
``boundary'' exceptions.}\par
The final action of the right-hand-side of this rule is to propose a
new task (if there exist some generalizations of the concept Y, which
in our case is ``Primes''). So {\AM} accesses the Generalizations facet
of Primes, which is ``(Numbers).'' A new task is therefore added to the
agenda: ``{\bf Check examples of Primes},'' with an associated reason:
``Just as Primes was no more general than Odd-primes, so Numbers may
turn out to be no more general than Primes.'' The reason is rated
according to the formula given in the rule; say it gets the value
500.
To make this example a little more interesting, let's suppose that
the task ``{\bf Check examples of Primes}'' already existed on the
agenda, but for the reason ``Many examples of Primes have been found,
but never checked,'' with a rating for the reason of 100, and for the
task as a whole of 200. The global task-rating formula then assigns
the task a new overall priority of 600, because of the new, fairly
good reason supporting it.
When that task is eventually chosen, the heuristic rule pictured
above (at the beginning of this subsection) will trigger and will be run
again, with X=Primes and Y=Numbers. That is, {\AM} will be considering
whether (almost) all numbers are primes. The left-hand-side of the
heuristic rule will quickly fail, since, \eg, ``6'' is an example of
Numbers which does not satisfy the definition of Primes.
\subsectionbegin[4.4]{Another illustration: Discovering Unique Factorization}
Below is a heuristic rule which is a key agent in the process of
``noticing'' the fundamental theorem of arithmetic.\ffootnote{The unique
factorization conjecture: any positive integer {\it n} can be represented
as the product of prime numbers in precisely one way (to within
reorderings of those prime factors). Thus 28 = 2$\times$2$\times$7, and we don't
distinguish between the factorizations (2 2 7) and (2 7 2).}
\han4{{\it If F({\rm a}) is unexpectedly a BB,}}
\han4{{\it Then maybe ($\forall $x) F(x) is a BB.}}
\noindent Below, the same rule is given in more detail.
The first conjunct on the IF-part of the heuristic rule indicates that it's
relevant to checking examples of any given operation F. A typical
example is selected at random, say F(x)=y. Then y is examined, to see
if it satisfies any more stringent properties than those specified in
the Domain/range facet of F. That is, the Domain/range facet of F
contains an entry of the form A$\→$B; so if x is an A, then all we are
guaranteed about y is that it is an example of a B. But now, this
heuristic is asking if y isn't really an example of a much more
specialized concept than B. If it is (say it's an example of a BB),
then the rest of the examples of F are examined to see if they too
satisfy this same property. If all examples appear to map from domain
set A into range set BB (where BB is much more restricted than the
set B specified originally in the Domain/range facet of F), then a
new conjecture is made: the domain/range of F is really A$\→$BB, not
A$\→$B. Here is that rule, in crisper notation:
\yskip
\han2{{\it If the current task is to Check Examples of the operation F,}}
\han3{{\it and F is an operation from domain A into range B,}}
\han3{{\it and F has at least 10 examples,}}
\han3{{\it and a typical one of these examples is ``<x$\,\→\,$y>''
(so x$\,\epsilon $A and y$\,\epsilon$ B),}}
\han3{{\it and (Forsome Specialization BB of B), y is a BB,}}
\han3{{\it and {\bf all} examples of F (ignoring boundary cases) turn out to be BB's,}}
\han2{{\it Then print the following conjecture: ``F(a) is always a BB,
not simply a B,''}}
\han3{{\it and add it to the Examples facet of the Conjectures concept,}}
\han3{{\it and add ``<A $\→$ BB>'' as a new entry to the Domain/range facet of F,
replacing ``<A $\→$ B>,''}}
\han3{{\it and if the user asks, inform him that the evidence for this was that
all $\|$Examples(F)$\|$ examples of F (ignoring boundary examples) turned
out to be BB's,}}
\han3{{\it and check the truth of this conjecture by running F on boundary examples of A.}}
\yskip
\noindent Let's see how this rule was used in one instance. In Task 79 in the
sample excerpt in Chapter 2, {\AM} defined the concept Prime-times,
which was a function transforming any number {\it n} into the set of all
factorizations of {\it n} into primes. For example,
Prime-times(12)=$\{$(2 2 3)$\}$,
Prime-times(13)=$\{$(13)$\}$. The domain of F=Prime-times was the
concept Numbers. The range was Sets. More precisely, the range was
Sets-of-Bags-of-Numbers, but {\AM} didn't know that concept at that
time.
The above heuristic rule was applicable. F was Prime-times, A was
Numbers, and B was Sets. There were far more than 10 known examples
of Prime-times in action. A typical example was this one: <21 $\→$
$\{$(3,7)$\}$>. The rule now asked that $\{$(3,7)$\}$ be fed to each
specialization of Sets, to see if it satisfied any of their
definitions. The Specializations facet of Sets was accessed, and
each concept pointed to was run (its definition was run) on the
argument ``$\{$(3,7)$\}$.'' It turned out that Singleton and
Set-of-doubletons were the only two specializations of Sets satisfied
by this example. At this moment, {\AM} had narrowed down the potential
conjectures to these two:
\listbegin
\bullistentry{{\it Prime-times(x) is always a singleton set.}}
\bullistentry{{\it Prime-times(x) is always a set of doubletons.}}
\listend
\noindent Each example of Prime-times was examined, until one was found to
refute each conjecture (for example, <8 $ \→\{$(2,2,2)$\}$> destroys
conjecture 2). But no example was able to disprove conjecture 1. So
the heuristic rule plunged forward, and printed out to the user ``A
new conjecture: Prime-times({\it n}) is always a singleton-set, not simply
a set.'' The entry <Numbers$\→$Singleton-sets> was added to the
Domain/range facet of Prime-times, replacing the old entry <Numbers$\→$Sets>.
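The whole episode can be sketched in modern notation (Python standing in for Interlisp; the factorization routine and the two specialization predicates are illustrative stand-ins for the concepts' Definitions facets):

```python
# Sketch of the range-narrowing rule applied to F = Prime-times:
# test each specialization of Sets against every example value of F.

def prime_factorizations(n):
    """All factorizations of n into primes, as sorted tuples (unordered bags)."""
    def helper(n, smallest):
        if n == 1:
            return {()}
        result = set()
        for p in range(smallest, n + 1):
            if n % p == 0 and all(p % d for d in range(2, p)):
                result |= {(p,) + rest for rest in helper(n // p, p)}
        return result
    return helper(n, 2)

specializations_of_sets = {                  # stand-in definition predicates
    "Singleton": lambda s: len(s) == 1,
    "Set-of-doubletons": lambda s: all(len(x) == 2 for x in s),
}

examples = {n: prime_factorizations(n) for n in range(2, 30)}
surviving = [name for name, defn in specializations_of_sets.items()
             if all(defn(v) for v in examples.values())]
```

Here ``Set-of-doubletons'' is refuted (8 maps to $\{$(2 2 2)$\}$), while ``Singleton'' survives every example, which is the unique-factorization conjecture.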
Let's digress for a moment to discuss the robustness of the system.
What if this heuristic were to be excised? Could {\AM} still propose
unique factorization? The answer is yes, there are other ways to
notice it. If {\AM} has the concept of a Function,\ffootnote{A single-valued
relation. That is, for any domain element x, F(x) contains precisely
one member. It is never empty (\ie, undefined), nor is it ever
larger than a singleton (\ie, multiple-valued).} then a heuristic
rule like the one in the previous subsection will
cause {\AM} to ask if Prime-times is not merely a Relation, but also a
Function.
The past few sections should have convinced the reader that isolated
heuristic rules really can do all kinds of things: propose new tasks,
create new concepts, fill in entries for specific facets
(goal-driven), and look for conjectures (data-driven empirical
induction). The rules appear fairly general\ffootnote{That is, applicable in
many situations. It would be worse than useless if a rule existed
which could lead to a single discovery like ``Fibonacci series'' but
never lead to any other discoveries. The reasons for demanding
generality are not only ``fairness,'' but the insights which occur when
it is observed that several disparate concepts were all motivated by
the same general principle (\eg, ``looking for the inverse image of
extrema'').}---though that must be later verified empirically. They
are redundant in a pleasing way: some of the most ``important,''
well-known, and interesting conjectures can (apparently) be derived
in many ways. Again, we must justify this experimentally.
\sectionskip
\sectionbegin[5]{Gathering Relevant Heuristics}
Each concept has facets which (may) contain some heuristics. Some of these
are for filling in, some for checking, some for deciding
interestingness,\footnote{The reader has already seen several heuristics
useful for filling in and checking facets; here is one for judging
interestingness: an entry on the Interest facet of Compose says that
a composition A$\circ $B is more interesting if the range of B {\sl equals} the
domain of A, rather than if they only {\sl partially}
overlap.} some for noticing new conjectures, etc.
{\AM} contains hundreds of these heuristics. In order to save time (and
to make {\AM} more rational), each heuristic should only be tried
in situations where it might apply, where it makes sense, in its
``domain of applicability.''
\subsectionbegin[5.1]{Domain of Applicability}
How is {\AM} able to zero in on the relevant heuristic rules, once a
task has been selected from the agenda?
The secret is that each heuristic rule is stored somewhere {\it a
propos} to its ``domain of applicability.'' This ``proper place'' is
determined by the first conjunct in the left-hand side of the rule.
What does this mean? Consider this heuristic:
\han1{{\it If the current task is to fill in examples of the operation F,
\hskip 14pt plus 5pt minus 4pt $\←$===}}
\han3{{\it and some examples of the domain of F are known,}}
\han1{{\it Then one way to get examples of F is to run F on randomly chosen
examples of the domain of F.}}
\yskip
\noindent This is a reasonable thing to try---but only in certain situations.
Should it be tried when the current task is to check the Worth facet
of the Sets concept? No, it would be irrational. Of course, even if
it were tried then, the left-hand side would fail very quickly. Yet
{\sl some} cpu time would have been used, and if the user were
watching, his opinion of {\AM} would decrease.\footnote{This notion of
worrying about a human user who is observing {\AM} run in real time may
appear to be quite language- and machine-dependent. An increase in speed of a couple
orders of magnitude would radically alter the qualitative appearance of {\AM}.
In Chapter 7, however, the reader will grasp how difficult it
is to objectively rate a system like {\AM}. For that reason, all measures
of judgment must be respected. Also, to the actual human being using the
system this really is one of the more important measures.}\par
That particular heuristic has a precise domain of applicability: {\AM}
should use it whenever the current task is to fill in examples of an
operation, and only in those kinds of situations.
The key observation is that a heuristic typically applies to {\sl all
examples of a particular concept C}. In the case we were
considering, C=Operation. Intuitively, we'd like to tack that
heuristic onto the Examples facet of the concept Operation, so it
would only ``come to mind'' in appropriate situations. This is
precisely where the heuristic rule {\sl is} stored.
Initially, the author identified the proper concept C and facet F for
each heuristic H which {\AM} possessed, and tacked H onto C.F.\ffootnote{Recall
that C.F is an abbreviation for facet F of concept C.} This was all
preparation, completed long before {\AM} started up. Each heuristic was
tacked onto the facet which uniquely indicates its domain of
applicability. The first conjunct of the IF-part of each heuristic
indicates where it is stored and where it is applicable. Notice the
little arrow ($\←$===) pointing to that conjunct above.\footnote{In the LISP
implementation, these first conjuncts are omitted, since the
{\sl placement} of a heuristic serves the same purpose as if it had
some ``pre-preconditions'' (like these first conjuncts) to determine
relevance quickly.}\par
While {\AM} is running, it will choose a task dealing with, say, facet F
of concept C. {\AM} must quickly locate the heuristic rules which are
relevant to satisfying that chosen task. {\AM} simply locates all
concepts which claim C as an example. If the current task were
``{\bf Check the Domain/range of Union$\circ$Union},''\footnote{ This operation is
defined as: $Union\circ Union(x,y,z) =↓{df} (x \union y) \union z$.
It accepts three
sets as arguments, and returns a new set as its value.} then C
would be Union$\circ$Union. Which concepts claim C as an example? They
include Compose-with-Self, Composition, Operation, Active,
Any-concept, and Anything. {\AM} then collects the heuristics tacked
onto facet F (in this case, F is Domain/range) of each of those
concepts. All such heuristics will be relevant. In the current case,
some relevant heuristics might be garnered from the Domain/range
facet of the concept Operation. Any heuristic which can deal with
the Domain/range facet of {\sl any} operation can certainly deal with
Union$\circ$Union's Domain/range. A typical rule on
Operation.Domain/range.Check\footnote{The ``Check'' subfacet of the
``Domain/range'' facet of the ``Operation'' concept.} would be this
one:
\han2{{\it If a Dom/ran entry of F is of the form <D D D$\ldotss$D $\→$ R>,}}
\han3{{\it and R is a generalization of D,}}
\han2{{\it Then test whether the range might not be simply D.}}
\yskip
\noindent Suppose one entry on Union$\circ$Union.Dom/ran was ``<Nonempty-sets
Nonempty-sets Nonempty-sets $\→$ Sets>.'' Then this last heuristic rule
would be relevant, and would have {\AM} ask the plausible question: Is
the union of three nonempty sets always nonempty? The answer is
affirmative, empirically, so {\AM} modifies that Domain/range entry for
Union$\circ$Union. {\AM} would ask the same question for
Intersect$\circ$Intersect. Although the answer then would be ``{\sl No},''
it's still a rational inquiry. If {\AM} called on this heuristic rule
when the current task were ``{\bf Fillin specializations of Bags},'' it
would clearly be an irrational act. The domain of applicability of
the rule is clear, and is precisely fitted to the slot where the rule
is stored (tacked onto Operation.Domain/range).
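That Domain/range check can be sketched as follows (Python standing in for Interlisp; the definition predicate for Nonempty-sets and the example arguments are invented for illustration):

```python
# Sketch of the rule on Operation.Domain/range.Check: given an entry
# <D D ... D -> R> with R a generalization of D, test empirically whether
# the range might really be D.

def is_nonempty_set(s):                      # stand-in definition of D
    return isinstance(s, frozenset) and len(s) > 0

def union_union(x, y, z):                    # Union o Union
    return x | y | z

def intersect_intersect(x, y, z):            # Intersect o Intersect
    return x & y & z

def range_might_be_D(op, domain_examples, d_defn):
    """Run op on example arguments; see if every result satisfies D."""
    return all(d_defn(op(*args)) for args in domain_examples)

args = [(frozenset({1}), frozenset({2, 3}), frozenset({4}))]
narrows_union     = range_might_be_D(union_union, args, is_nonempty_set)
narrows_intersect = range_might_be_D(intersect_intersect, args, is_nonempty_set)
```

As in the text, the union of nonempty sets passes the test while the intersection does not, though both were rational inquiries.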
To recap the basic idea: when dealing with a task ``{\bf Do act A on facet
F of concept C},'' {\AM} must locate all the concepts X claiming C as an
example. {\AM} then gathers the heuristics tacked onto X.F.A, for each
such general concept X. All of them---and only they---are relevant
to satisfying that task.
So the whole problem of locating relevant heuristics has been reduced
to the problem of efficiently finding all concepts of which C is an
example (for any given concept C). This process is called ``{\sl rippling
away from C in the ISA direction},'' and forms the subject of the
next subsection.
\subsectionbegin[5.2]{Rippling}
Given a concept C, how can {\AM} find all the concepts which claim C as
an example?
The most obvious scheme is to store this information explicitly. So
the Examples facet of C would point to all known examples of C, and
the Isa facet of C would point to all known concepts claiming C as
one of their examples. Why not just do this?\ffootnote{This is
the implementation chosen by several projects, \eg MOLGEN
[Stefik 78].} Because one can substitute a
modest amount of processing time (via chasing links around) for the
vast amount of storage space that would be needed to have ``everything
point to everything.''
Each facet contains only enough pointers so that the entire graph of
Exs/Isa and Spec/Genl links could be {\sl reconstructed} if needed. Since
``Genl''
is a transitive relation, {\AM} can compute that Numbers is a
generalization of Mersenne-primes, if the facet Mersenne-primes.Genl
contains the entry ``Odd-primes,'' and Odd-primes.Genl contains a
pointer to ``Primes,'' and Primes.Genl points to ``Numbers.'' This kind
of ``{\sl rippling}'' activity is used to efficiently locate all concepts
related to a given one X. In particular, {\AM} knows how to ``ripple
upward in the Isa direction,'' and quickly\footnote{With about 200 known
concepts, with each Isa facet and each Genl facet pointing to about 3
other concepts, about 25 links will be traced along in order to
locate about a dozen final concepts, each of which claims the given
one as an example. This whole rippling process, tracing 25 linkages,
uses less than .01 cpu seconds, in compiled Interlisp, on a KI-10
type PDP-10.} locate all concepts which claim X as one of their
examples.
It turns out that {\AM} cannot simply call for X.Isa, then the Isa
facets of those concepts, etc., because Isa is not transitive.\footnote{If x
isa y, and y isa z, then x is (generally) {\it NOT} a z. This is due to the
intransitivity of ``member-of.'' Generalization is transitive, on the
other hand, because ``subset-of'' is transitive.} For the interested
reader, the algorithm {\AM} uses to collect Isa's of X is given below:\footnote{
For the {\it very} interested reader, it is explained in great detail
in the permanently-archived file RIPPLE[dis,dbl] at SAIL. Copies are available
from the author.}
\listbegin
\numlistentry[1]{{\it All generalizations of the given concept X are located. {\AM}
accesses X.Genl, then the Genl facets of {\bf those} concepts, etc.}}
\numlistentry[2]{{\it The ``Isa'' facet of each of those concepts is accessed.}}
\numlistentry[3]{{\it {\AM} locates all generalizations of these newly-found higher-level
concepts. This is the list of all known concepts which claim X as
one of their examples.}}
\listend
\noindent In regular form, one might express this rippling recipe more compactly as:
$Isas(x) \ \ =↓{df} \ \ Genl↑*(Isa(Genl↑*(x)))$.
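\yskip
\noindent As a purely illustrative sketch (AM itself was written in Interlisp; the link tables and concept names below are hypothetical stand-ins), the three-step recipe can be rendered as:

```python
def genl_star(concepts, genl):
    """Reflexive-transitive closure of the Genl links: Genl*."""
    seen, frontier = set(concepts), list(concepts)
    while frontier:
        c = frontier.pop()
        for g in genl.get(c, []):
            if g not in seen:
                seen.add(g)
                frontier.append(g)
    return seen

def isas(x, genl, isa):
    """Isas(x) = Genl*(Isa(Genl*(x))): every concept claiming x as an example."""
    gens = genl_star({x}, genl)                         # step 1: all generalizations of x
    direct = {c for g in gens for c in isa.get(g, [])}  # step 2: their Isa entries
    return genl_star(direct, genl)                      # step 3: generalize those

# Hypothetical fragment of AM's Genl/Isa graph:
genl = {"Mersenne-primes": ["Odd-primes"], "Odd-primes": ["Primes"],
        "Primes": ["Numbers"], "Objects": ["Any-concept"]}
isa = {"Numbers": ["Objects"]}
```

Note that the Isa links themselves are never chased transitively; only the Genl links are closed over, exactly as the formula prescribes.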
\subsectionbegin[5.3]{Ordering the Relevant Heuristics}
Now that all these relevant heuristics have been assembled, in what
order should {\AM} execute them?\ffootnote{The discussion below assumes that the
heuristics don't interact with each other; \ie, that each one may
act independently of all the others: they are not `strongly coupled'.
The validity of this simplification
is tested empirically (see Chapter 6) and discussed
theoretically (see Chapter 7) later.} It is important to
note that the heuristics tacked onto very general concepts will be
applicable frequently, yet will not be very powerful. For example,
here is a typical heuristic rule which is tacked onto the Examples
facet of the very general concept Any-concept:
\han3{{\it If the current task is to fill in examples of any concept X,}}
\han3{{\it Then one way to get them is to symbolically
instantiate\footnote{``Symbolic
instantiation'' is a euphemism for a bag of tricks which transform a
declarative definition of a concept into particular entities
satisfying that definition. The only constraint on the tricks is that
they not actually {\sl run} the definition. One such trick might be:
if the definition is recursive, merely find some entity that
satisfies the base step. {\AM}'s symbolic instantiation
tricks are too hand-crafted to be of great
interest, hence this will not be covered any more deeply here.
The interested reader is directed to
the pioneering work by
[Lombardi & Raphael 64], or the more recent literature on these techniques
applied to automatic program verification (\eg, [Moore 75],
[Balzer 78]).} a definition of X.}}
\yskip
\noindent It takes a tremendous amount of inference to squeeze a couple of awkward
examples of Intersect$\circ$Intersect out of that concept's definition.
Much time could be wasted doing so.\footnote{Incidentally, this illustrates
why no single heuristic should be allowed to monopolize the
processing of any one task.}\par
Just as general heuristics are weak but often relevant, specific
heuristics are powerful but rarely relevant. Consider this heuristic
rule, which is attached to the very specific concept
Compose-with-Self:
\han3{{\it If the current task is to fill in examples of the composition F$\circ$F,}}
\han3{{\it Then include any fixed-points of F.}}
\yskip
\noindent For example, since Intersect($\phi$,X) equals $\phi$,
Intersect$\circ$Intersect($\phi$,X,Y) must equal $\phi$ as well.\footnote{$\phi$ (read ``phi'')
is another name for
the empty set, also written $\{\}$. This last sentence thus says that
since $\{\}\inter X = \{\}$, then $(\{\} \inter X)\inter Y$ must
also equal $\{\}$.} Assuming that such examples exist already on Intersect.Examples,
this heuristic will fill in a few examples of Intersect$\circ$Intersect with
essentially no processing required. Of course the domain of
applicability of this heuristic is minuscule.
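\yskip
\noindent A minimal sketch of that fixed-point trick (the ((args), value) representation of examples is an assumption of this sketch, not AM's actual facet format): if a stored example shows F(a,b) returning a itself, then F(F(a,b),b) = F(a,b) = a, so an example of F$\circ$F falls out with no evaluation whatsoever:

```python
def compose_self_examples(f_examples):
    """From examples of F, stored here as ((a, b), value) pairs, harvest
    'free' examples of F∘F: whenever F(a, b) = a, it follows that
    F(F(a, b), b) = F(a, b) = a, with no need to ever run F."""
    out = []
    for (a, b), v in f_examples:
        if v == a:                        # the result is a fixed point of F
            out.append(((a, b, b), v))    # so F∘F(a, b, b) = a, for free
    return out

# Intersect(phi, X) = phi, so Intersect∘Intersect(phi, X, X) = phi:
phi, X = frozenset(), frozenset({1, 2})
found = compose_self_examples([((phi, X), phi)])
```

The sketch reuses an existing argument as the composition's third input, which is sound given only the one stored example; the full heuristic, knowing $\phi$ is absorbing, can supply any Y.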
As we expected, the narrower its domain of applicability, the more
powerful and efficient a heuristic is, and the less frequently it's
useful. Thus in any given situation, where {\AM} has gathered many
heuristic rules, it will probably be best to execute the most
specific ones first, and execute the most general ones last.
Below are summarized the three main points that make up {\AM}'s scheme
for finding relevant heuristics in a ``natural'' way and then using
them:
\listbegin
\bullistentry{Each
heuristic is tacked onto the most general concept for which
it applies: it is given as large a domain of applicability as
possible. This will maximize its generality, but leave its power
untouched. This brings it closer to the ``ideal'' tradeoff point
between these two quantities.}
\bullistentry{When the current task deals with concept C, {\AM} ripples away from C
and quickly locates all the concepts of which C is an example. Each
of them will contain heuristics relevant to dealing with C.}
\bullistentry{{\AM} then applies those heuristics in order of increasing
generality. You may wonder how {\AM} orders the heuristics by
generality. It turns out that the rippling process automatically
gathers heuristics in order of increasing generality. In the LISP
system, each rule is therefore executed as soon as it's found. So {\AM}
never wastes time gathering heuristics it won't have time to
execute.}
\listend
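\yskip
\noindent These three points can be caricatured in a few lines of Python (a sketch only, under the simplifying assumptions that rippling is a single breadth-first walk up the combined Genl/Isa links and that ``cpu time'' is just a count of rules run; all names are hypothetical):

```python
from collections import deque

def ripple_order(start, genl, isa):
    """Yield concepts in the order a breadth-first walk up the Genl/Isa
    links reaches them: nearest (most specific) concepts first."""
    seen, queue = {start}, deque([start])
    while queue:
        c = queue.popleft()
        yield c
        for nxt in genl.get(c, []) + isa.get(c, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

def run_heuristics(concept, genl, isa, rules, budget):
    """Execute each concept's rules as soon as rippling reaches it, so
    specific (powerful) rules run before general (weak) ones, and no
    rule is ever gathered unless there is still time to run it."""
    executed = []
    for c in ripple_order(concept, genl, isa):
        for rule in rules.get(c, []):
            if budget <= 0:
                return executed
            budget -= 1
            executed.append(rule)   # stand-in for actually running the rule
    return executed

# Hypothetical slice of the Genl tree, with rules attached at two levels:
genl = {"Compose-with-Self": ["Composition"], "Composition": ["Operation"],
        "Operation": ["Any-concept"]}
rules = {"Compose-with-Self": ["include-fixed-points"],
         "Any-concept": ["instantiate-definition"]}
```

With an ample budget, the specific fixed-point rule fires before the general instantiate-a-definition rule; with a budget of one, only the specific rule ever gets gathered or run.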
\sectionskip
\sectionbegin[6]{{\AM}'s Starting Heuristics}
This section will briefly characterize the collection of
242 heuristic rules which {\AM} was originally given.
A complete listing of those rules is found in Appendix 2;
the rule numbers below refer to the numbering given in that appendix.
\subsectionbegin[6.1]{Heuristics Grouped by the Knowledge They Embody}
Many heuristics embody the belief that mathematics is an empirical inquiry.
That is, one approach to discovery is simply to perform experiments,
observe the results, thereby gather statistically significant amounts of data,
induce from that data some new conjectures or new concepts worth isolating,
and then repeat this whole process again.
Some of the rules which capture this spirit are numbers
21, 43--57, 91, 136--139, 146--148, 153--154, 212--216, 225, and 241.
As one might expect, most of these are ``Suggest'' type rules.
They indicate plausible moves for {\AM} to make, promising new tasks to
try, new concepts worth studying.
Almost all the rest are ``Fillin'' type rules, providing empirical methods
to find entries for a specified facet.
Another large set of heuristics is used to embody---or to modify---what
can be called ``focus of attention.'' When should {\AM} keep on the same track,
and when not? The first
rules expressing varying nuances of this idea are numbers
1--5. The last such rules are numbers 209--216.
Some of these rules are akin to goal-setting mechanisms (\eg, rule
141).
In addition, many of the ``Interest'' type rules have some relation to keeping
{\AM} interested in recently-chosen concepts (or: in concepts related to them,
\eg by Analogy, by Genl/Spec, by Isa/Exs,\ellipsis).
The remaining ``Interest'' rules are generally some re-echoing of the following
notion: X is interesting if F(X) has an unexpected (interesting) value.
For example, in rule
26,
``F(X)'' is just ``Generalizations of X.''
In slightly more detail, the principal characteristics of
interestingness are:
\listbegin
\bullistentry{{\it symmetry (\eg, in an expanding analogy)}}
\bullistentry{{\it coincidence (\eg, in a concept being re-discovered often)}}
\bullistentry{{\it appropriateness (\eg, in choosing an operation H so that
G$\circ$H will have nicer Domain/Range characteristics than G itself did)}}
\bullistentry{{\it recency (see the previous paragraph on focus of attention)}}
\bullistentry{{\it individuality (\eg, the first entity observed which
satisfies some property)}}
\bullistentry{{\it usefulness (\eg, there are many conjectures involving it)}}
\bullistentry{{\it association (\ie, the given concept is related to an
interesting one)}}
\listend
One group of heuristic rules embeds syntactic tricks for ge\-ne\-ra\-liz\-ing
definitions (Lisp predicates), specializing them, instantiating them,
symbolically evalu\-at\-ing them, inverting them, rudimentarily analyzing them,
etc. For example, see rules
31 and 89.
Some rules serve other syntactic functions, like ensuring that
various limits aren't exceeded (\eg, rule 15),
that the format for each facet is adhered
to (\eg, rule 16),
that the entries on each facet are used as they are meant to be
(\eg, rules 9 and 59), etc.
Many of the ``Check'' type heuristics fall into this category.
Finally, {\AM} possesses a mass of miscellaneous rules which evade categorization.
See, \eg, rules 185 and 236. These range from genuine math
heuristics (rules which lead to discovery frequently) to simple data management
hacks.
\subsectionbegin[6.2]{Heuristics Grouped by How Specific They Are}
Another dimension of distribution of heuristics, aside from the above
{\sl functional}
one, is simply that of how high up in the Genl/Spec tree they are located.
The table below summarizes how the rules were distributed in that tree:
\vskip 2.5in
Here is a key to the column headings:
\parindent 0pt \bf
LEVEL: {\it How far down the Genl/Spec tree of concepts we are looking.}
\eg: {\it A sample concept at that level.}
$\#$ Con's: {\it The total number of concepts at that level.}
$\#$ w/Heur: {\it How many of them have {\sl some} heuristics.}
$\#$ Heurs: {\it The total number of heuristics attached to concepts at that level.}
Avg: {\it The mean number of heuristics per concept, at that level.}
Avg w/Heur: {\it ($\#$ Heurs) / ($\#$ w/Heur) }
$\#$ Fillin: {\it Total number of ``Fillin'' type heuristics at that level.}
$\#$ Sugg: {\it Total number of ``Suggest'' type heuristics at that level.}
$\#$ Check: {\it Total number of ``Check'' type heuristics at that level.}
$\#$ Int: {\it Total number of ``Interestingness'' type heuristics at that level.}
\yyskip
\tenpoint\rm
The heuristic rules are seen {\sl not} to be distributed uniformly or homogeneously
among all the initial concepts. The extent of this skewing was not realized
by the author until the above table was constructed.
A surprising proportion of rules are attached to the very general concepts.
The top 10\%\ of the concepts contain 73\%\ of all the heuristics.
One notable exception is the ``Interest'' type heuristics: they seem more evenly
distributed throughout the tree of initial concepts.
This tends to suggest that future work on providing ``meta-heuristics'' should
concentrate on how to automatically synthesize those Interest heuristics for
newly-created concepts.
\worldend